LU-Decomposition on a Massively Parallel Transputer System
Abstract
Two algorithms for LU-decomposition on a transputer-based reconfigurable MIMD parallel computer with distributed memory have been analyzed in view of the interdependence of granularity and execution time. To investigate this experimentally, the LU-decomposition algorithms have been implemented on a parallel computer, the Parsytec SuperCluster 128. The results of this investigation may be summarized as follows. The LU-decomposition algorithms are very efficient on the parallel computer if the ratio between problem size and number of processors is not too small. No loss of efficiency is to be expected if the number of processors is increased only proportionally to the number of elements in the matrix being decomposed.
The parallel computer Parsytec SuperCluster 128 (SC 128) is a massively parallel transputer system with distributed memory and 128 processors of type T805 plus some special-purpose processors [10]. In the SC 128 the transputers are connected by a statically reconfigurable interprocessor network built from 12 Network Configuration Units (NCUs) [10, 11]. Each NCU is a 96 × 96 crossbar switch for link connections. Software running on the SC 128 must be based on message passing.
Exemplary parallel LU-decomposition algorithms have been implemented on this parallel computer to investigate parallel algorithms [7]. These algorithms decompose a matrix M into a lower triangular matrix L and an upper triangular matrix U with M = L U. One of the matrices L and U is unit triangular (unit lower or unit upper, respectively). The parallel algorithms are derived from the algorithms of Crout and of Doolittle, which differ in the choice of the unit triangular matrix.
All algorithms described in this paper require a processor network with a torus structure of p × p processors. An ideal torus structure occupies all four links of each transputer and allows no entry into the processor network. Therefore, an additional transputer must be inserted into one of the link connections.
A free link of this transputer provides the entry into the processor network. During...
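The difference between the Doolittle and Crout variants named in the abstract can be illustrated with a minimal sequential sketch. This is not the paper's parallel algorithm: it runs on a single processor, uses no pivoting (it assumes every pivot is nonzero), and the function names `lu_doolittle`, `lu_crout`, and `matmul` are hypothetical names chosen for illustration. The sketch follows the common convention in which Doolittle places the unit diagonal in L and Crout places it in U.

```python
def lu_doolittle(m):
    """Doolittle variant: L gets the unit diagonal. No pivoting (assumption)."""
    n = len(m)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # Row k of U.
        for j in range(k, n):
            upper[k][j] = m[k][j] - sum(lower[k][s] * upper[s][j] for s in range(k))
        # Column k of L, scaled by the pivot U[k][k].
        for i in range(k + 1, n):
            partial = sum(lower[i][s] * upper[s][k] for s in range(k))
            lower[i][k] = (m[i][k] - partial) / upper[k][k]
    return lower, upper


def lu_crout(m):
    """Crout variant: U gets the unit diagonal. No pivoting (assumption)."""
    n = len(m)
    lower = [[0.0] * n for _ in range(n)]
    upper = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        # Column k of L.
        for i in range(k, n):
            lower[i][k] = m[i][k] - sum(lower[i][s] * upper[s][k] for s in range(k))
        # Row k of U, scaled by the pivot L[k][k].
        for j in range(k + 1, n):
            partial = sum(lower[k][s] * upper[s][j] for s in range(k))
            upper[k][j] = (m[k][j] - partial) / lower[k][k]
    return lower, upper


def matmul(a, b):
    """Plain square matrix product, used only to check that M = L U."""
    n = len(a)
    return [[sum(a[i][s] * b[s][j] for s in range(n)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    m = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
    for factorize in (lu_doolittle, lu_crout):
        lower, upper = factorize(m)
        print(factorize.__name__, matmul(lower, upper) == m)
```

Both functions produce factors whose product reproduces the example matrix; they differ only in which factor carries the ones on its diagonal, which is the choice the abstract refers to.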
Similar resources
Parallel LU Decomposition on a Transputer Network
A parallel algorithm is derived for LU decomposition with partial pivoting on a local-memory multiprocessor. A general Cartesian data distribution scheme is presented which contains many of the existing distribution schemes as special cases. This scheme is used to prove optimality of load balance for the grid distribution. Experimental results of an implementation of the algorithm in occam-2 on...
A Parallel Matrix Inversion Algorithm on Torus with Adaptive Pivoting
This paper presents a parallel algorithm for matrix inversion on a torus-interconnected MIMD-MC multiprocessor. This method is faster than the parallel implementations of other widely used methods, namely Gauss-Jordan, Gauss-Seidel, or LU-decomposition-based inversion. This new algorithm also introduces a novel technique, called adaptive pivoting, for solving the zero pivot problem at no cost. O...
First European PVM Users' Group Meeting, Rome, Italy, October 9-11, 1994: Performance of PVM on a Highly Parallel Transputer System
Although PVM was developed to use a network of heterogeneous UNIX computers as a single large parallel computer, it has become an interface for portable programming even on MPPs. We present PVM performance results for a massively parallel transputer system with up to 512 processors. In comparison to an implementation of the same application in the native transputer operating system Parix, we r...
Performance of PVM on a Highly Parallel Transputer System
Although PVM was developed to use a network of heterogeneous UNIX computers as a single large parallel computer, it has become an interface for portable programming even on MPPs. We present PVM performance results for a massively parallel transputer system with up to 512 processors. In comparison to an implementation of the same application in the native transputer operating system Parix, we r...